Dialogue State Tracking (DST), a key component of task-oriented conversation systems, represents user intentions by determining the values of pre-defined slots in an ongoing dialogue. Existing approaches use hand-crafted templates and additional slot information to fine-tune and prompt large pre-trained language models and elicit slot values from the dialogue context. Significant manual effort and domain knowledge are required to design effective prompts, limiting the generalizability of these approaches to new domains and tasks. In this work, we propose DiSTRICT, a generalizable in-context tuning approach for DST that retrieves highly relevant training examples for a given dialogue to fine-tune the model without any hand-crafted templates. Experiments with the MultiWOZ benchmark datasets show that DiSTRICT outperforms existing approaches in various zero-shot and few-shot settings using a much smaller model, thereby providing an important advantage for real-world deployments that often have limited resource availability.
The inception of large language models has helped advance state-of-the-art performance on numerous natural language tasks. This has also opened the door for the development of foundation models for other domains and data modalities such as images, code, and music. In this paper, we argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models to handle tasks like process mining, optimization, and decision making. These models should also tackle the unique challenges of applying AI to business processes which include data scarcity, multi-modal representations, domain specific terminology, and privacy concerns.
Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple input distributions, typically in classification, lifelong reinforcement learning (LRL) must also deal with variations in the state and transition distributions, and in the reward functions. Modulating masks, recently developed for classification, are particularly suitable to deal with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison with LRL baselines in both discrete and continuous RL tasks shows competitive performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, the algorithm solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
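The idea of reusing previously learned masks through a linear combination can be illustrated with a minimal sketch. It assumes a frozen backbone with one binary mask per task, and it assumes the new task's mask is obtained by thresholding a softmax-weighted blend of the new task's scores and the old masks. The function names, the softmax weighting, and the zero threshold are illustrative assumptions, not the formulation used in the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def binarize(scores, threshold=0.0):
    """Turn real-valued scores into a 0/1 mask (assumed threshold rule)."""
    return [1.0 if s > threshold else 0.0 for s in scores]


def combined_mask(new_scores, old_masks, betas):
    """Blend the new task's scores with masks learned on earlier tasks.

    betas holds one coefficient for the new scores followed by one per old
    mask; in training, both new_scores and betas would be learnable.
    """
    weights = softmax(betas)
    blended = [weights[0] * s for s in new_scores]
    for weight, mask in zip(weights[1:], old_masks):
        blended = [b + weight * m for b, m in zip(blended, mask)]
    return binarize(blended)


def masked_forward(backbone_weights, mask, inputs):
    """One masked linear unit: the frozen weights are gated by the mask."""
    return sum(w * m * x for w, m, x in zip(backbone_weights, mask, inputs))


# Two masks from earlier tasks and untrained scores for a new task.
old_masks = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
new_scores = [-2.0, 0.5, -0.1]
mask = combined_mask(new_scores, old_masks, betas=[0.0, 0.0, 0.0])
output = masked_forward([0.5, -1.0, 2.0], mask, [1.0, 1.0, 1.0])
```

In this sketch the third weight is switched on for the new task only because both earlier masks selected it, which is the kind of knowledge reuse the abstract describes.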
In this paper, a complete framework for autonomous self-driving is implemented. LIDAR, camera, and IMU sensors are used together. All data communication is managed using the Robot Operating System (ROS), which provides a robust platform for implementing robotics projects. A Jetson Nano provides on-board processing. Sensor fusion is performed on the data received from the different sensors to improve the accuracy of the decisions and inferences derived from the data. The fused data is then used to create a localized map of the environment, and the position of the vehicle is obtained with respect to this map. The SLAM techniques used for this purpose are Hector Mapping and GMapping, both widely used in ROS. Apart from SLAM, which primarily uses LIDAR data, visual odometry is implemented using a monocular camera. The sensor-fused data is then used by Adaptive Monte Carlo Localization to localize the car. Using the localized map, path planning techniques such as the "TEB planner" and the "Dynamic Window Approach" are implemented for autonomous navigation of the vehicle. The last step in the project is the implementation of control, the final decision-making block in the pipeline, which outputs speed and steering commands that are compatible with Ackermann kinematics. The implementation of such a control block under a ROS framework using the three sensors (LIDAR, camera, and IMU) is the novel approach undertaken in this project.
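As a rough illustration of what an Ackermann-compatible control output involves, the sketch below converts a planner's forward speed and yaw rate into a speed and a front-wheel steering angle using the standard bicycle-model approximation, steering = atan(wheelbase * yaw_rate / speed). The function name, the 0.3 m wheelbase, and the zero-speed handling are assumptions for illustration; the abstract does not describe the project's actual control block.

```python
import math


def twist_to_ackermann(speed, yaw_rate, wheelbase):
    """Map (speed [m/s], yaw_rate [rad/s]) to (speed, steering angle [rad]).

    Uses the bicycle-model approximation of Ackermann steering. When the
    vehicle is (almost) stopped the steering angle is undefined, so this
    sketch simply returns a straight-ahead angle.
    """
    if wheelbase <= 0:
        raise ValueError("wheelbase must be positive")
    if abs(speed) < 1e-6:
        return speed, 0.0
    return speed, math.atan(wheelbase * yaw_rate / speed)


# Hypothetical small car with a 0.3 m wheelbase.
straight = twist_to_ackermann(1.0, 0.0, wheelbase=0.3)
turning = twist_to_ackermann(2.0, 1.0, wheelbase=0.3)
stopped = twist_to_ackermann(0.0, 0.5, wheelbase=0.3)
```

Planners such as TEB or the Dynamic Window Approach typically emit velocity commands of this (speed, yaw rate) form, which is why a conversion step of this kind is needed for a car-like vehicle.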
As the demand for autonomous driving increases, it is paramount to ensure safety. Early accident prediction using deep learning methods for driving safety has recently gained much attention. In this task, an early accident prediction and a point prediction of where the driver should look are produced from dashcam video input. We propose to exploit the double actors and regularized critics (DARC) method, for the first time, on this accident forecasting platform. We derive inspiration from DARC since it is currently a state-of-the-art reinforcement learning (RL) model on continuous action space suitable for accident anticipation. Results show that by utilizing DARC, we can make predictions 5% earlier on average while improving in multiple metrics of precision compared to existing methods. The results imply that using our RL-based problem formulation could significantly increase the safety of autonomous driving.
Multi-Task Learning (MTL) has shown its importance in user products through fast training, data efficiency, reduced overfitting, and similar benefits. MTL achieves this by sharing network parameters and training one network for multiple tasks simultaneously. However, MTL does not provide a solution when each task needs to be trained on a different dataset. To solve this problem, we propose an architecture named TreeDNN along with its training methodology. TreeDNN trains the model with multiple datasets simultaneously, where each branch of the tree may need a different training dataset. Our results show that TreeDNN provides competitive performance, with the advantages of a reduced ROM requirement for parameter storage and increased system responsiveness, because only the required branch is loaded at inference time.
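The storage and loading benefit of a tree-shaped network can be sketched as follows. This is a minimal illustration that assumes a single shared trunk layer and one linear branch per task; the class names `Linear` and `TreeModel` and the `subtree` method are hypothetical and do not reflect the actual TreeDNN implementation or its training procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Linear:
    """A tiny dense layer: weights is a list of rows (outputs x inputs)."""
    weights: list
    bias: list

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

    def num_params(self):
        return sum(len(row) for row in self.weights) + len(self.bias)


@dataclass
class TreeModel:
    """Shared trunk plus one branch per task (assumed two-level tree)."""
    trunk: Linear
    branches: dict = field(default_factory=dict)

    def subtree(self, task):
        # Keep only the trunk and the branch needed for this task.
        return TreeModel(self.trunk, {task: self.branches[task]})

    def predict(self, task, x):
        hidden = [max(0.0, h) for h in self.trunk(x)]  # ReLU on trunk output
        return self.branches[task](hidden)

    def num_params(self):
        return self.trunk.num_params() + sum(
            branch.num_params() for branch in self.branches.values())


trunk = Linear([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])
full = TreeModel(trunk, {
    "a": Linear([[1.0, 1.0]], [0.5]),
    "b": Linear([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
})
only_a = full.subtree("a")
prediction = only_a.predict("a", [3.0, 4.0])
```

Here the full tree stores 15 parameters while the subtree loaded for task "a" holds 9, which mirrors the reduced footprint at inference time that the abstract claims.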
Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.
Harnessing the benefits of drones for urban innovation at scale requires reliable aerial autonomy. One major barrier to advancing aerial autonomy has been collecting large-scale aerial datasets for training machine learning models. Due to costly and time-consuming real-world data collection through deploying drones, there has been an increasing shift towards using synthetic data for training models in drone applications. However, to increase generalizability of trained policies on synthetic data, incorporating domain randomization into the data generation workflow for addressing the sim-to-real problem becomes crucial. Current synthetic data generation tools either lack domain randomization or rely heavily on manual workload or real samples for configuring and generating diverse realistic simulation scenes. These dependencies limit scalability of the data generation workflow. Accordingly, there is a major challenge in balancing generalizability and scalability in synthetic data generation. To address these gaps, we introduce a modular scalable data generation workflow tailored to aerial autonomy applications. To generate realistic configurations of simulation scenes while increasing diversity, we present an adaptive layered domain randomization approach that creates a type-agnostic distribution space for assets over the base map of the environments before pose generation for drone trajectory. We leverage high-level scene structures to automatically place assets in valid configurations and then extend the diversity through obstacle generation and global parameter randomization. We demonstrate the effectiveness of our method in automatically generating diverse configurations and datasets and show its potential for downstream performance optimization. Our work contributes to generating enhanced benchmark datasets for training models that can generalize better to real-world situations.
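Automatic emotion recognition (ER) has recently attracted considerable interest due to its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance over unimodal approaches by combining diverse and complementary sources of information, providing some robustness to noisy and missing modalities. In this paper, we focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos, where complementary audio-visual (A-V) relationships are explored to predict an individual's emotional state in valence-arousal space. Most state-of-the-art fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. To address this problem, we introduce a joint cross-attention model for A-V fusion that extracts salient features across the A-V modalities, effectively leveraging inter-modal relationships while retaining intra-modal relationships. In particular, it computes the cross-attention weights based on the correlation between the joint feature representation and that of the individual modalities. Deploying the joint A-V feature representation in the cross-attention module helps to leverage intra-modal and inter-modal relationships simultaneously, significantly improving the performance of the system over the vanilla cross-attention module. The effectiveness of our proposed approach is validated experimentally on challenging videos from the RECOLA and AffWild2 datasets. Results indicate that our joint cross-attention A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches, even when the modalities are noisy or absent.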
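The core step described above, attention weights computed from the correlation between a joint A-V representation and each individual modality, can be sketched in a simplified form. The sketch assumes the joint representation is the per-frame concatenation of audio and visual features, a scaled tanh correlation, a row-wise softmax, and a residual connection. The projection matrices `w_audio` and `w_visual` stand in for learnable parameters, and all of these choices are assumptions rather than the paper's exact architecture.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax so every row of attention weights sums to 1."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def joint_cross_attention(x_mod, joint, w):
    """Attend over one modality using its correlation with the joint features.

    x_mod: L x d_m features of one modality, joint: L x d joint features,
    w: d_m x d projection (hypothetical learnable parameter).
    Returns the attended features (L x d_m) and the L x L attention weights.
    """
    d = len(joint[0])
    corr = matmul(matmul(x_mod, w), transpose(joint))  # L x L correlation
    corr = [[math.tanh(v / math.sqrt(d)) for v in row] for row in corr]
    attn = softmax_rows(corr)
    attended = matmul(attn, x_mod)
    # Residual connection keeps the original intra-modal information.
    fused = [[a + x for a, x in zip(a_row, x_row)]
             for a_row, x_row in zip(attended, x_mod)]
    return fused, attn


# Three frames with 2-D audio features and 1-D visual features (toy values).
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5], [0.2], [0.9]]
joint = [a + v for a, v in zip(audio, visual)]  # per-frame concatenation
w_audio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_visual = [[0.0, 0.0, 1.0]]
audio_att, audio_weights = joint_cross_attention(audio, joint, w_audio)
visual_att, visual_weights = joint_cross_attention(visual, joint, w_visual)
```

Because each modality is correlated with the joint features rather than only with the other modality, the attention weights in this sketch depend on both intra-modal and inter-modal similarity.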
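Rapid progress in digitalization and automation has led to accelerated growth in healthcare, generating novel models that are creating new channels for delivering treatment at reduced cost. The Metaverse is an emerging technology in the digital space with huge potential in healthcare, enabling realistic experiences for patients as well as medical practitioners. The Metaverse is a confluence of multiple enabling technologies, such as artificial intelligence, virtual reality, augmented reality, medical devices, robotics, and quantum computing, through which new directions for providing quality healthcare treatment and services can be explored. The amalgamation of these technologies ensures immersive, intimate, and personalized patient care. It also provides adaptive intelligent solutions that eliminate the barriers between healthcare providers and receivers. This article provides a comprehensive review of the Metaverse for healthcare, emphasizing the state of the art, the enabling technologies for adopting the Metaverse in healthcare, the potential applications, and the related projects. The issues in adapting the Metaverse for healthcare applications are also identified, and plausible solutions are highlighted as part of future research directions.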